Discussion
Abstract
Reliably and automatically selecting variables in linear regression models is a notoriously difficult problem. This paper attacks the problem head on, introducing not only a computationally efficient algorithm, LARS (and its derivatives), but also comprehensive theory that explains the intricate details of the procedure and guides its practical implementation. This is a fascinating paper and I commend the authors for this important work.

Automatic variable selection, the main theme of this paper, has many goals. So before embarking upon a discussion of the paper, it is important to first sit down and clearly identify what the objectives are. The authors make it clear in their introduction that, while the goal in variable selection is often to select a "good" linear model, where goodness is measured in terms of prediction accuracy, it is also important to choose models that lean toward the parsimonious side. So here the goals are pretty clear: we want good prediction error performance but also simpler models. These are certainly reasonable objectives and quite justifiable in many scientific settings. At the same time, however, one should recognize the difficulty of the task, as the two goals, low prediction error and smaller models, can be diametrically opposed. By this I mean that, from an oracle point of view, minimizing prediction error will certainly identify the true model, and thus, by going after prediction error (in a perfect world), we will also get smaller models by default. In practice, however, small gains in prediction error often translate into larger models and less dimension reduction. So as procedures get better at reducing prediction error, they can also get worse at picking out variables accurately. Unfortunately, I have some misgivings that LARS might be falling into this trap.
Mostly my concern is fueled by the fact that Mallows' C_p is the criterion used for determining the optimal LARS model. The use of C_p often leads to overfitting, and this, coupled with the fact that LARS is a forward optimization procedure, which is often found to be greedy, raises some potential flags. This, by the way, does not necessarily mean that LARS per se is overfitting, but rather that I think C_p may be an inappropriate model selection criterion for LARS. It is …
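
For reference, the C_p criterion mentioned above is commonly written in the form below. This is a sketch in standard notation, not a formula quoted from this discussion. The symbols are assumptions introduced here for illustration: y is the response vector of length n, \hat{\mu} is the fitted vector of the candidate model, \bar{\sigma}^2 is an estimate of the error variance (often taken from the full least-squares fit), and df is the degrees of freedom of the fit, which in the LARS setting is often approximated by the number of steps taken.

```latex
C_p(\hat{\mu}) \;=\; \frac{\lVert y - \hat{\mu} \rVert^2}{\bar{\sigma}^2} \;-\; n \;+\; 2\,\mathrm{df}
```

Under this form, each additional degree of freedom adds only 2 to the penalty, so a variable lowers C_p whenever it reduces the scaled residual sum of squares by more than 2. That is one way to read the concern that C_p can favor larger models.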